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1  Introduction 


Special  function  transforms  are  a  widely  used  and  well-understood  tool  of  applied  mathe¬ 
matics;  they  are  encountered,  inter  alia,  in  weather  and  climate  modeling  [19],  [20],  [21], 
tomography  [14],  [9],  electromagnetics  [13],  and  acoustics  [6],  [16].  Examples  of  such  trans¬ 
forms  include  the  Fourier  transform,  the  Fourier-Bessel  transform,  orthogonal  polynomial 
transforms,  etc. 

We  present  an  algorithm  for  the  numerical  computation  of  several  special  function  trans¬ 
forms.  Suppose  that  S'  is  a  change  of  basis  matrix  between  the  standard  basis  and  a  basis  of 
special  functions.  We  begin  by  compressing  each  of  the  submatrices  of  S  shown  in  Figure  1; 
we  refer  to  these  sub  matrices  as  Level  0.  Each  subsequent  level  consists  of  submatrices  ob¬ 
tained  from  the  previous  level  by  merging  horizontally  and  splitting  vertically.  Level  1  is 
illustrated  in  Figure  2  and  Level  2  is  illustrated  in  Figure  3.  In  the  last  level,  each  submatrix 
extends  across  an  entire  row  of  S,  as  illustrated  in  Figure  4.  We  compress  each  submatrix  at 
each  level.  We  then  use  the  compressed  submatrices  to  apply  the  matrix  S  to  an  arbitrary 
vector;  this  requires  0(n\og{n))  operations  to  apply  any  matrix  such  that  the  rank  of  any 
p  x  q  contiguous  sub  matrix  is  bounded  by  a  constant  times  pq/n.  We  prove  the  required  rank 
bounds  for  the  case  of  the  Fourier-Bessel  transform  in  Theorem  4.3.  Numerical  examples 
demonstrate  a  much  wider  applicability. 

In  addition  to  enabling  the  fast  application  of  certain  matrices,  the  algorithm  of  the 
present  paper  can  be  used  as  a  tool  for  matrix  compression.  The  n  x  n  matrices  examined 
are  compressed  using  approximately  0(n\og(n))  memory. 


Figure  1:  Level  0 

It  should  be  pointed  out  that  the  algorithm  of  this  paper  is  very  similar  to  that  of  [12]  and 
has  been  motivated  by  the  latter.  In  particular,  the  term  “butterfly  algorithm”  is  introduced 
in  [12],  due  to  its  similarity  to  the  butterfly  stage  in  the  Fast  Fourier  Transform  (FFT). 

It  is  not  the  purpose  of  this  paper  to  review  the  extensive  literature  on  the  subject  of 
algorithms  for  special  function  transforms.  For  a  detailed  survey  of  the  literature  we  refer 
the  reader  to  [15],  [17],  [22],  and  the  references  therein.  In  brief,  the  algorithm  described  in 
[17]  handles  associated  Legendre  functions,  spheroidal  wave  functions  of  all  orders,  associ¬ 
ated  Laguerre  functions,  and  the  Fourier-Bessel  transform.  The  algorithm  described  in  [22] 
handles  associated  Legendre  functions,  Hermite  functions,  associated  Laguerre  functions, 
the  Fourier-Bessel  transform,  and  sums  of  Bessel  functions  of  varying  orders.  The  algorithm 
described  in  [22]  is  much  preferable  to  that  of  [17]  in  most  circumstances. 
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Figure  3:  Level  2 


The  algorithm  of  the  present  article  is  quite  easy  to  implement,  and,  furthermore,  applies 
directly  to  Fourier-Bessel  series,  whereas  the  algorithms  of  [17]  and  [22]  do  not.  The  structure 
of  our  algorithm  is  notably  different.  This  paper  illustrates  our  algorithm  by  treating  the 
example  of  the  Fourier-Bessel  transform  in  full  detail.  We  also  illustrate  the  application 
of  the  butterfly  algorithm  to  a  wide  range  of  special  function  transforms  numerically,  in 
Section  6  below. 

The  present  paper  has  the  following  structure:  Section  2  sets  the  notation.  Section  3 
collects  various  known  facts  which  later  sections  utilize.  Section  4  provides  the  principal 
lemmas  which  Section  5  uses  to  construct  an  algorithm.  Section  5  describes  the  algorithm 
of  the  present  paper,  providing  details  about  its  computational  costs.  Section  6  illustrates 
the  performance  of  the  algorithm  via  several  numerical  examples.  Section  7  draws  several 
conclusions  and  discusses  possible  extensions. 

2  Notation 

We  define  R  to  be  the  set  of  all  real  numbers.  We  define  C  to  be  the  set  of  all  complex 
numbers.  Throughout  this  paper,  we  use  %  =  \/—T. 

For  a  real  number  x ,  we  define  \_x\  to  be  the  largest  integer  n  which  satisfies  n  <  x. 

For  any  positive  integer  l ,  we  define  we  define  1  to  be  the  real  l  x  l  matrix  whose  (j,  k ) 
entry  is  1  if  j  =  k ,  and  0  if  j  ^  k,  for  integers  j ,  k  such  that  1  <  j  <  l  and  1  <  k  <  l.  We 
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Figure  4:  The  last  level 


refer  to  1  as  the  identity  matrix  of  size  l. 

Suppose  that  a  and  b  are  real  numbers  and  that  f{x)  is  a  complex  valued  function, 
defined  for  every  real  number  x  such  that  a  <  x  <  b.  We  define  the  L2[a,  b]  norm  of  /  via 
the  formula 

II/IIm  =  J ^  \f(x)\2dx.  (2.1) 

We  define  L2[a,b]  to  be  the  set  of  all  complex  valued  functions  defined  on  the  interval  [a,  b] 
such  that  || /||  [«, 6]  <  oo. 

Suppose  that  a,  b,  u,  and  v  are  real  numbers  such  that  a  <  b  and  u  <  v.  Suppose  further 
that  A  :  L2[a,  b\  — >  L2[u,v]  is  an  integral  operator  with  kernel  k(x,t )  :  [a,  b]  x  [u,v]  — >  C 
given  by  the  formula 

(Af)(x)  =  f  k(x,  t)f  (x)  dx.  (2.2) 

J  a 

We  define  the  spectral  norm  of  the  kernel  k  or  the  operator  A  via  the  formula 


Pll 


2 


sup 

f£L2[a,b] 


\\Af\\[u,v} 

II/IIm 


If  v  is  an  n  x  1  vector  we  define  its  norm  ||u||  via  the  formula 

n 

IMI  = 

3= 1 


(2.3) 


(2.4) 


where  v3  is  the  jth  entry  in  v,  for  every  integer  j  such  that  l  <  j  <  n.  If  S  is  a  n  x  m  matrix, 
we  define  its  norm  || S'!!  via  the  formula 


\\S\\ 


max 

0^w£Rm 


(2.5) 
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3  Mathematical  preliminaries 

3.1  Facts  from  numerical  analysis 

3.1.1  Numerical  rank 

Suppose  that  m  is  a  postitive  integer  and  that  a,  b,  u ,  v ,  and  e  are  real  numbers  such  that 
a  <  b  and  u  <  v,  with  £  >  0.  Suppose  further  that  A  :  L2[a,b]  — >  L2[u,v]  is  the  integral 
operator  with  kernel  k  :  [a,  6]  x  [u,  v]  — >  C,  given  by  the  formula 

(Af)(x)  =  f  k(x,  t)f{x)  dx.  (3.1) 

J  a 

The  operator  A  in  (3.1)  is  defined  to  have  rank  m  to  precision  £  if  m  is  the  least  integer 
such  that  there  exist  2m  functions  gi,g2, ... ,  gm-i,  9m  and  hi,  h2,  ■  ■  ■ ,  hm_\ ,  hrn  satisfying 
II k(x,t)  -  Yxk=i9k{.x)hk{t)\\2  =  £. 

3.1.2  The  interpolative  decomposition 

The  following  lemma  states  that,  for  any  m  x  n  matrix  A  whose  rank  is  r,  there  exist  an 
m  x  r  matrix  B  whose  columns  constitute  a  subset  of  the  columns  of  A,  and  a  r  x  n  matrix 
P ,  such  that 

1.  some  subset  of  the  columns  of  P  makes  up  the  r  x  r  identity  matrix, 

2.  P  is  not  too  large,  and 

3.  BP  =  A. 

Moreover,  the  lemma  provides  an  analogous  approximation  BP  to  A  when  the  exact  rank 
of  A  is  not  r,  but  the  (r  +  l)st  singular  value  of  A  is  nevertheless  small.  We  refer  to  B  as 
a  column  skeleton  matrix  and  to  P  as  an  interpolation  matrix.  We  refer  to  the  expression 
BP  =  A  as  an  interpolative  decomposition  of  A.  The  lemma  can  be  found,  in  a  slightly 
different  form,  in  [11],  [2],  and  [8]. 

Lemma  3.1  Suppose  that  m  and  n  are  positive  integers,  and  A  is  a  complex  mx  n  matrix. 

Then,  for  any  positive  integer  r  with  r  <  m  and  r  <  n,  there  exist  a  complex  rxn  matrix 
P,  and  a  complex  m  x  r  matrix  B  whose  columns  constitute  a  subset  of  the  columns  of  A, 
such  that 

1.  some  subset  of  the  columns  of  P  makes  up  the  r  x  r  identity  matrix, 

2.  no  entry  of  P  has  an  absolute  value  greater  than  2, 

3.  ||P||  <  y/4 r(n  -  r)  +  1, 

4-  the  least  (that  is,  rth  greatest)  singular  value  of  P  is  at  least  1, 

5.  BP  =  A  when  r  =  m  or  r  =  n,  and 
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6.  \\BP  —  A\\  <  y/4F(  n  —  r)  +  1  <7r+i  when  r  <  m  and  r  <  n,  where  <rr+1  is  the  ( r  +  l)st 
greatest  singular  value  of  A. 

Remark  3.2  In  this  paper,  we  use  the  numerical  scheme  for  the  computation  of  the  matrix 
P  described  in  [2].  The  scheme  is  stable  and  requires  0(rmn )  floating  point  operations  and 
(D(mn )  floating  point  words  of  memory.  The  reader  is  referred  to  [11],  [2],  and  [8]  for  a  more 
detailed  description  of  matrix  skeletonization  and  related  techniques. 


3.2  Special  functions 

3.2.1  Fourier  series 

The  following  information  concerning  discrete  Fourier  transforms  on  equispaced  nodes  can 
be  found,  for  example,  in  [3].  We  also  describe  discrete  Fourier  transforms  for  nonequispaced 
nodes,  referring  the  reader  to  [4]  and  [5]  for  more  information. 

The  functions  emx,  with  integer  n,  are  orthogonal  on  the  interval  [ — 7r,  i r],  that  is,  for  any 
integers  m  and  n  such  that  m  ^  n, 

["  eimxeinx  dx  =  0.  (3.2) 


The  functions  emx,  with  integer  n ,  form  a  basis  for  L2[— n,  i r]  so  that  for  any  square  integrable 
function  /  :  [ — 7r,  tx]  — >  C,  there  exist  real  numbers  Co,  c i,  C2, . . .  such  that 

OO 

m  =  E  cP’-  (3-3) 

j=-oo 

In  numerical  applications,  the  series  in  (3.3)  is  truncated  after  n  terms,  for  some  appropriately 
chosen  integer  n.  The  function  /  is  thus  approximated  by  the  trigonometric  polynomial 
p  :  [— 7r,  7 r]  — >  oo,  given  by  the  formula 

n 

f(x )  «  p(x)  =  C3el3X-  (3-4) 

j——n 


Formula  (9.2.4)  in  [3]  states  that  the  complex  number  Cj  in  equation  (3.4)  is  given  by  the 
formula 

1  2  n 

ci  =  ^rTj'Ep^)e~iixt-  (3-5) 

1  k= 0 

where  the  real  number  Xk  is  defined  via  the  formula 


Xk 


2nk 
2n+  1’ 


(3.6) 


for  integers  j,  k  such  that  —  n  <  j  <  n  and  0  <  k  <  2 n. 

The  discrete  Fourier  transform  maps  the  2n  +  1  values  p(xk)  of  the  function  p  at  the 
nodes  Xk  to  the  2n  +  1  coefficients  Cj  in  (3.4),  for  —  n  <  j  <  n  and  0  <  k  <  2 n.  The  inverse 
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discrete  Fourier  transform  maps  the  2n  + 1  coefficients  Cj  in  (3.4)  to  the  2n+ 1  values  p(xk)  of 
the  function  p  at  the  nodes  Xk,  for  0  <  k  <  2n  and  —  n  <  j  <  n.  The  fast  Fourier  transform 
(FFT)  algorithm  for  computing  the  forward  and  inverse  discrete  Fourier  transforms  is  widely 
known  (see,  for  example,  [3]).  Note  that  the  FFT  and  equation  (3.5)  rely  on  the  facts  that 
x0,  Xi, . . . ,  X2n- 1,  x2 n  are  equispaced  and  j  is  an  integer  between  —  n  and  n. 

The  algorithm  described  in  Section  5  can  be  used  to  evaluate  sums  of  the  form 


2  n 

a,  =  e-"«, 


k= 0 


(3.7) 


where  x0,  X\, . . . ,  x2n-i,  x2n  are  arbitrary  real  numbers  between  0  and  27t  and  o*0,  uq,.  . . , 
0*2,1- 1,  x>2 n  are  arbitrary  real  numbers  between  —  n  and  n.  If  Xq,Xi,  . . .  ,x2n_i,x2n  are  non- 
equispaced,  we  refer  to  the  sum  in  equation  (3.7)  as  a  discrete  Fourier  transform  for  nonequi- 
spaced  nodes.  Calculation  of  a  Fourier  transform  for  nonequispaced  nodes  requires  that  we 
apply  to  the  vector  (p(xo) , p(xi) , . . .  ,p(x 2n-i),  ( x2n))J  the  matrix  Tp  defined  by  the  formula 


/  g  — ioJQXO  .  .  .  g  —  iu>0X2n  \ 


= 


\e 


—  lUJ2nXQ 


o  ^J^2n3'2n 


(3.8) 


Remark  3.3  If  the  nodes  x0,xi, . . .  ,x2n_i,x2n  are  equispaced  the  use  of  the  FFT  to  calcu¬ 
late  the  forward  and  inverse  Fourier  transforms  is  faster  than  the  algorithm  of  the  present 
paper.  If  the  nodes  xo,x\, ,  x2n-i,x2n  are  not  equispaced,  the  forward  and  inverse  Fourier 
transforms  can  be  calculated  via  the  application  of  the  matrix  Tp  defined  in  equation  (3.8); 
this  is  accelerated  by  the  algorithm  described  in  Section  5.  However,  the  algorithms  of  [4] 
and  [5]  are  faster  than  the  algorithm  of  this  paper  for  computing  nonequispaced  Fourier 
transforms.  Unlike  the  methods  of  [4]  and  [5]  the  method  of  this  paper  works  for  many 
transforms  besides  the  Fourier  transform. 


3.2.2  Bessel  functions 

Following  the  standard  practice,  we  will  be  denoting  by  Jm  the  Bessel  function  of  the  first 
kind  of  order  m  and  by  H^j  the  Ha.nkel  function  of  the  first  kind  of  order  m.  Whenever  m 
is  an  integer,  Jm  is  analytic  in  the  whole  complex  plane,  and  Hrn  has  a  singularity  at  0  and 
a  branch  cut  along  the  negative  real  axis.  The  properties  of  Bessel  functions  are  extremely 
well-known,  and  the  reader  is  referred  (for  example)  to  [1]  and  [25]. 

The  algorithm  of  Section  5  accelerates  the  evaluation  of  the  Fourier-Bessel  transform 
as  described  in  Sections  3.2.3  and  3.2.4.  In  addition,  the  algorithm  described  in  Section  5 
accelerates  the  evaluation  of  sums  of  Bessel  and  Ha.nkel  functions  over  varying  orders.  These 
sums  are  analogous  to  expansions  in  orthogonal  polynomials  (see,  for  example,  [24]  and 
[23]).  For  any  non-negative  real  number  x  and  complex  numbers  aq,.  ■  •  ,am-2,  am- i,  we 
consider  the  sums 

m—  1 

g(x )  =  ^akJk(x), 

k= o 
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(3.9) 


and 


m—  1 

w(x)  =  y~'j  akHjp(x),  (3.10) 

k= 0 

To  evaluate  the  function  g,  defined  in  (3.9),  at  the  real  points  xi,x2, . . .  ,  we  must 

apply  to  the  vector  (a0,  aq, . . . ,  am_ 2,  am_i)T  the  matrix  Ej  defined  via  the  formula 


Ej  = 


fUxi)  Jm—l{pCl)^ 

\MXm)  '  ‘  ‘  J 


(3.11) 


To  evaluate  the  function  w,  defined  in  (3.10),  at  the  real  points  aq,  x2, . . . ,  xm_i,  xm  we  must 
apply  to  the  vector  (a0,  aq, . . . ,  am_2,  am_ i)T  the  matrix  defined  via  the  formula 


Uo'W)  •"  Hi !h(*m)/ 


(3.12) 


The  algorithm  described  in  Section  5  accelerates  the  application  of  the  matrices  Ej  and  Ejp 
dehned  in  (3.11)  and  (3.12)  respectively. 

In  this  paper,  we  will  need  two  identities  connecting  Bessel  functions  with  Chebyshev 
polynomials.  Equation  (3.13)  is  a  reformulation  of  Formula  7.355.1  in  [7]  and  equation  (3.14) 
is  a  reformulation  of  Formula  7.355.2  in  [7]. 

Lemma  3.4  For  any  non-negative  integer  k  and  positive  real  number  y, 

(-1)  %J^(y)  =  faT^§^ds  (3.13) 

and 

(-1  (3.14) 


3.2.3  The  Fourier-Bessel  transform 

We  consider  the  two  dimensional  Fourier  transform  g  of  the  function  g,  dehned  via  the 
formula 


'•OO  POO 


g(x,y)e-ix^-imdxdy.  (3.15) 

We  next  express  g  in  polar  coordinates,  obtaining  the  function  7  dehned  via  the  formula 


—  OO  J  —OO 


7  (P,r)  =  g(£,v),  (3.16) 

where 

£  =  pcos(r)  (3.17) 

and 

g  =  psin(r),  (3.18) 
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for  any  non-negative  real  number  p  and  any  real  number  r  such  that  —  n  <  r  <  n.  For  each 
fixed  non- negative  real  number  p,  we  consider  the  Fourier  series  of  the  function  y(p,  r), 

OO 

7(p,r)=  £  (3.19) 

m=— oo 

where  the  Fourier  coefficient  7 m(p)  is  given  by  the  formula 

lM  =  ^  f  l(P,T)e-‘mT  dT,  (3.20) 

for  every  integer  m.  We  next  express  g  in  polar  coordinates,  obtaining  the  function  /  defined 
via  the  formula 

f(r,t)=g(x,y),  (3.21) 

where 

x  =  rcos(t)  (3.22) 

and 

p  =  rsin(t),  (3.23) 

for  any  non-negative  real  number  r  and  any  real  number  t  such  that  —  n  <  t  <  tt.  It  is  well 

known  (see,  for  example,  page  137  in  [18])  that  the  Fourier  coefficient  7 m(p)  defined  in  (3.20) 
satisfies 

/•OO 

7 m(p)  =  {-i)m  rJm{rp)fm{r)  dr,  (3.24) 

Jo 

where  Jm  is  the  Bessel  function  of  the  first  kind  of  order  m  and  fm(r )  is  the  mth  Fourier 
coefficient  of  the  function  f(r,t)  given  by  the  formula 

Ur)  =  ^  dt,  (3.25) 

for  every  integer  m.  The  function  7 m{p)  defined  in  (3.24)  is  known  as  the  Fourier-Bessel 
transform  of  /m(r). 

3.2.4  Fourier-Bessel  series 

In  this  section,  we  continue  to  use  the  notation  of  Section  3.2.3.  We  now  suppose  that  m  is 
a  non-negative  integer  and  that  the  functions  /  and  g  are  zero  outside  of  the  disk  of  radius 
R  centered  at  the  origin,  where  R  is  a  positive  real  number. 

We  define  the  real  function  Jm  :  [0,  00)  — >  R  via  the  formula 

Jm(p)  =  Jm(2npR).  (3.26) 

We  define  the  positive  real  numbers  0  <  pi  <  p^  <  p^  . . .  to  be  the  zeros  of  the  function  Jm 
defined  in  (3.26),  that  is 

JmiPk )  =  0.  (3.27) 
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We  define  the  functions  Jm,\,  Jm, 2,  Jm, 3,  •  •  •  on  the  interval  [0,  R]  via  the  formula 

V2r  Jm(2npkr ) 


R  J m+ 1  (27Tpfc-R) 


(3.28) 


for  any  positive  integer  k  and  non- negative  integer  m.  The  functions  Jm,  1 ,  Jm, 2,  •  •  •  are 

orthonormal,  that  is  for  any  positive  integers  j  and  k  such  that  j  7^  k, 

R 

Jm,j(r )  Jm,k(r)  dr  =  0  (3.29) 

and 

(Jm,j{r))2  dr  =  1.  (3.30) 

The  functions  Jm,i,  Jm,2,  Jm,3,  ■  ■  ■  are  dense  in  L2[0,R];  consequently  there  exist  real 
numbers  (3^,  (3^,  (3^, . . .  such  that 


Vrfm(r)  = 

k=i 


(3.31) 


for  any  real  number  r  such  that  0  <  r  <  R,  where  fm(r )  is  defined  in  (3.25).  The  sum  on  the 
right-hand  side  of  equation  (3.31)  is  known  as  a  Fourier-Bessel  series.  It  follows  from  (3.29) 
and  (3.30)  that  the  coefficient  (3^  in  equation  (3.31)  is  the  inner  product  of  Jm.k(j')  and 
y/rfm{r),  that  is, 

Prn=  [  Vr  Jm,k(r)fm(r)dr,  (3.32) 

Jo 

for  any  non-negative  integer  m  and  positive  integer  k.  The  numbers  Pm,  Pmi  Pmi  ■  ■  ■  defined 
in  (3.32)  are  known  as  Fourier-Bessel  coefficients.  It  follows  from  (3.32)  and  (3.28)  that 


3k 

r'm 


V2 


"R 


RJm+ii^npkR)  J 0 


rJm(27Tpkr)fm(r)  dr. 


(3.33) 


It  follows  from  (3.33)  and  the  fact  that  /  is  zero  outside  the  disk  of  radius  R  centered  at 
the  origin  that  the  Fourier-Bessel  coefficients  Pm,  Pm,  Pm,  ■  ■  ■  are  a  discretized  version  of  the 
Fourier-Bessel  transform  7m(p)  defined  in  equation  (3.24),  that  is, 


Pk 


V2 

R  Jm+liP'K PkR') 


im7m(2vrpfc). 


(3.34) 


Remark  3.5  The  integrand  rJm{2npkr)fm(r )  in  equation  (3.33)  has  infinitely  many  contin¬ 
uous  derivatives,  provided  that  the  function  fm(r)  has  infinitely  many  continuous  derivatives. 
We  use  the  Gaussian  quadrature  based  on  the  nodes  of  Legendre  polynomials  to  approximate 
the  integral  in  (3.33)  accurately.  Specifically,  we  use  the  nodes  yj  on  the  interval  [0,  R]  and 
corresponding  weights  Wj,  defined  in  Formula  25.4.30  in  [1], 
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Remark  3.6  To  ensure  that  we  sample  at  the  Nyquist  rate  or  higher,  we  discretize  at  slightly 
greater  than  two  nodes  per  wavelength,  using  n  nodes  in  the  calculation  of  the  Fourier-Bessel 
coefficients  Plm,  /3^,  . . . ,  Pm2~m~c~l ,  Pm2~m~C ,  where  m  is  the  order  of  the  Fourier-Bessel 
transform.  C  =  10  is  sufficient  when  the  precision  £  of  the  computations  is  1CF10  and  n  >  20. 
In  fact,  C  depends  weakly  on  e;  it  is  easily  shown  that  C  must  be  of  the  order  log(l/e). 


To  calculate  the  Fourier-Bessel  coefficients,  it  is  necessary  to  evaluate  the  integral  in  (3.33). 
As  described  in  Remark  3.5,  we  evaluate  the  integral  in  (3.33)  by  using  the  Gaussian  quadra¬ 
ture  with  nodes  y-j  and  weights  Wj.  We  truncate  the  series  in  equation  (3.31)  after  n/2—m—C 
terms,  where  n  is  the  number  of  nodes  y3  used  to  discretize  the  integral  in  equation  (3.33), 
m  is  the  order  of  the  Fourier-Bessel  transform,  and  C  is  as  described  in  Remark  3.6.  The 
function  \frfm{r)  is  thus  approximated  by  the  function  p  :  [0,  R]  — >  R  given  by  the  formula 

n/2—m—C 

Vrfm(r)  ~  p(r)  =  Y  PtJm,k(r)-  (3-35) 

k= 1 


We  dehne  the  Fourier-Bessel  series  transform  :  Rn  — >  Rra//2  m  c  of  order  m  and  size  n 
via  the  formula 


U%i(fm(yi),  fm(y 2),  •  •  •  ,  fm{yn- 1),  fm(yn))T  =  {PL  PL  •  •  •  »  P; 


j n/2—m—C—l  pn/2—m—C ^ T 


(3.36) 


For  every  integer  k  such  that  1  <  k  <  n/2  —  m  —  C,  we  dehne  the  real  number  7^  to  be 
the  approximation  of  the  real  number  P ^  given  in  (3.33)  obtained  by  using  the  Gaussian 
quadrature  with  nodes  yj  on  the  interval  [0,  R]  and  corresponding  weights  Wj.  That  is, 


or 

where 


7fc  = 

lm 


V2 


R  Jm+i(2npkR)  ^  w^J^Pkyj)fm(yj 


P*  7  =  U^fm: 


T  jn  nn  rpn  Tj/n 

^  m  ^m1  mvv  mi 

a  _  ( ol  o2  nn/2—m—C—l  nn/2—m—C\^ 

P  \Pm’>  Pm ?  *  *  *  5  Pm  1  Pm  )  i 

/v  =  (V  -y2  n/2-m-C-l  n/2—m—C\T 

I  \  lm ?  / m  1  *  *  *  1  I m  i  lm  )  ? 

fm  “  (/m(?/l))  fm{y 2)5  •  •  •  ?  fm{Vn—  l)?  fm{Vn))  5 

/ 


on  _ 


Jm+l(27rpii?) 


0 


Jm+l(2lTp2R) 


\ 


V  0 


rpn  _ 


Jm+ 1  P11  Pn/2—m  —  C  R) ) 

Jmp2ixpiyn)  ^ 


Pn/2—m—C  2/l )  '  '  '  Jm{2‘Kpn/2—m—C  yn) J 


(3.37) 

(3.38) 

(3.39) 

(3.40) 

(3.41) 

(3.42) 

(3.43) 


(3.44) 
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and 


fmyi 

o  \ 

wn  = 

w2y2 

\  0 

wnyn  J 

(3.45) 


The  algorithm  described  in  Section  5  accelerates  the  application  of  the  matrix  T™  in  (3.44) 
and  thus  accelerates  the  evaluation  of  the  Fourier- Bessel  series  transform. 

We  define  the  inverse  Fourier-Bessel  series  transform  Q ^  :  ]Rn/2-m-c  _>  ]Rn  Qf  order  m 
and  size  n  via  the  formula 


On  (B1  B2  6' 

Ko6m\hJm’>  rJm’>  *  *  *  ? 


n/2—m—C—l 


A 


n/2—m—C\T  _ 
m  ) 


p(yi)  p(y 2)  p(yn-i)  p(yn) 

y/lJl  y/V2  y/Vn—  1  y/Vn 


(3.46) 


where  PY  P^, . . . ,  /C/2  m  °  1 ,  pY,2  m  C  andp  satisfy  equation  (3.35),  and  y1:  y2, . . . ,  yn-i,  Un 
are  real  numbers.  It  follows  from  (3.35)  and  (3.46)  that 


fm  ~  QmP,  (3-47) 

where  fm  is  the  vector  defined  in  (3.42),  the  vector  (3  is  defined  in  equation  (3.40),  and  Q ^  is 
the  inverse  Fourier-Bessel  series  transform  defined  in  (3.46).  It  follows  from  (3.28)  and  (3.35) 
that,  in  matrix  notation,  equation  (3.46)  becomes 

g”/J=^r„”TS”A  (3.48) 

where  P  is  the  vector  defined  in  (3.40),  the  matrix  5"  is  defined  in  (3.43),  and  T™  is  the 
matrix  defined  in  (3.44). 


Remark  3.7  The  matrix  Q ^  defined  in  (3.46)  is  the  right  inverse  of  defined  in  (3.39), 
that  is 

ICQ"  =  1,  (3.49) 

where  1  is  the  (n/2  —  m  —  C)  x  (n/2  —  m  —  C)  identity  matrix.  Indeed,  suppose  that  /3^, 
PY  . . . ,  Pm2  m  °  1,  Pm.2  m  C  are  real  numbers  and  that  fm  is  the  real  function  defined  via 
the  formula 

n/2—m—C 

Vyfm(y)  =  faJmM-  (3-50) 

k=  1 

It  follows  from  (3.50)  and  the  definition  of  the  function  p  in  (3.35)  that 

Vyfm(y)=p(y)-  (3.5i) 

It  follows  from  (3.46)  and  (3.51)  that 

QmiPL  P‘m.1  •  •  •  )  PVY  =  {fmiVl),  fm{y 2),  •  •  •  ,  fmiVn- 1),  fm(yn)Y •  (3.52) 

Finally,  (3.49)  follows  from  (3.36)  and  (3.52). 
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4  Analytical  apparatus 

The  algorithm  of  this  paper  relies  on  the  observation  that  for  certain  n  x  n  matrices  the 
rank  to  precision  e  of  any  p  x  q  contiguous  submatrix  is  proportional  to  pq/n.  The  present 
section  contains  a  proof  of  this  fact  for  the  the  Fourier- Bessel  transform  (see  Theorem  4.3, 
below).  The  principal  tool  used  in  the  proof  of  Theorem  4.3  is  Lemma  4.1. 

The  following  lemma  states  that  the  rank  of  the  normalized  Fourier  transform  with 
kernel  e*7^r/4  is  bounded  by  a  constant  times  7,  at  any  fixed  precision  e.  This  lemma  can  be 
found  (in  a  slightly  different  form)  in  [10]. 


Lemma  4.1  Suppose  that  6,  e,  and  7  are  positive  real  numbers  such  that 


0  <  e  <  1. 


(4.1) 


Suppose  further  that  the  operator  F  :  L2[— 1,1]  — »  L2[— 1,1] 


oo  yiutm  uy 


{Fh)(r)  =  J  d£. 

Then,  F  has  rank  to  precision  e  at  most 


where 


W=<1+*>l£  +  f 


+  3, 


E  =  2 


\ 


2  In  (-)  In 

s' 


(4.2) 


(4.3) 


(4.4) 


Remark  4.2  If  the  Fourier  transform  with  kernel  elxt  is  restricted  to  a  rectangle  in  the  (x,  t ) 
plane  then  its  rank  is  bounded  by  a  constant  times  the  area  of  the  rectangle. 

Indeed,  suppose  that  the  operator  A  :  L2[a,  b]  — >  L2[u,v]  is  given  by  the  formula 


(Ag)(t)=  f  elxtg{x)dx. 

J  a 

(4.5) 

Defining 

t  2(z  -  a) 
s  b  —  a 

(4.6) 

_  .  2(/  -  u)  1 
v  —  u 

(4.7) 

and 

MO  =  9(x) 

(4.8) 

yields  that  the  operator  A  has  the  same  rank  as  the  operator  F  :  Z/2  [ — 1, 1]  — >  Z/2  [ — 1, 1] 
defined  in  (4.2),  with  kernel  e*7^T//4,  where  7  =  (b  —  a)(v  —  u)  is  the  area  of  the  rectangle 
[a,  b]  x  [u,v\  in  the  (x,t)  plane.  It  then  follows  from  Lemma  4.1  that  the  operator  A  defined 
in  (4.5)  has  rank  at  most  N,  where  N  is  defined  in  (4.3). 
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The  following  theorem  states  that  if  the  Fourier-Bessel  transform  with  kernel  x  Jm(xt) 
(see  equation  (3.24))  is  restricted  to  a  rectangle  in  the  (x,t)  plane,  its  rank  at  any  fixed 
precision  £  is  bounded  by  a  constant  times  the  area  of  the  rectangle. 


Theorem  4.3  Suppose  that  a,  b,  R,  u,  v,  S,  and  e  are  real  numbers  such  that  R,  6,  e  >  0 
and  u  <  v,  with  0  <  a  <  b  <  R.  Suppose  further  that  m  is  a  non-negative  integer  and 
Um  :  L2[a,  b]  — >  L2[u,v]  is  the  integral  operator  given  by  the  formula 

(Umf)(t)  =  (~i)m  [  xJm(xt)f(x)dx.  (4.9) 

J  a 

Then,  the  rank  of  Um  to  precision  Rf2eji\  is  at  most 


M  =  2(1  +  <5) 


(b-a){v-u)  E' 

- 2tt - +  7I+6’ 


(4.10) 


where  E  is  the  real  number  given  by  the  formula 


E 


1 


2  In  (-)  In 


7?  +  ^ 


Proof. 

We  prove  the  theorem  in  the  case  where 

m  =  2k, 


(4.11) 


(4.12) 


for  some  non- negative  integer  k.  The  proof  when  m  =  2k  +  1,  for  some  non- negative  integer 
k  is  similar. 

We  start  by  combining  (4.9)  and  (3.14)  to  obtain 


or 


( U2kf)(t )  =  -  /  xf(x) 
7T  .  L 


( U2kf)(t )  =  -  xf(x) 


T2k(s)  cos (xts) 
)  V7!  -  S2 


ds,  )  dx 


IT 


T2k(s)(eixts  +  e~ixts) 
)  a/1  -  s2 


ds  dx. 


Introducing  the  notation 

with 

we  rewrite  (4.14)  in  the  form 


a 


x(£)  ~  ^(6  +  1)  +  a, 


a  =  b  —  a, 


a 


(u2kf)(t)  =  -  J  *(0M0 


T2k(s)  c(£,t,s) 

vT ^ 


ds  df, 


(4.13) 

(4.14) 

(4.15) 

(4.16) 

(4.17) 
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where 


and 


f  ^  _  gitsa£/2-\-itsa/2-\-itsa  _|_  ^—itsa^/2—itsa/2—itsa 

MO  =  f{x). 


Now,  introducing  the  notation 


r  = 


(3  —  v  —  u, 

7  =  aft, 

2  (t  —  u) 


P 


-  1, 


^(T  s  =  eia/3sT/2ei'ysT/4:eiasueiasu/2ei'ys/4:eipas/2eiasu£/2ei'ys€/4 

we  observe  that  the  function  c  defined  in  (4.18)  assumes  the  form 

c(£,  t, s)  =  n(r,  s,  ()  +  e-1^1  ,(r,  s,  fl"1 

and  that 


(V2kh)(r)  =  ( U2kf)(t ), 


where 


(V2fc/i)(r)  = 

—  f1  x(€)h{£)(  f1  T2k{s)  (  ei7TS?/4'^r’ s’  0  +  s, 0"1) 


(4.18) 

(4.19) 

(4.20) 

(4.21) 

(4.22) 

(4.23) 

(4.24) 

(4.25) 

(4.26) 


1-1  \J  0  a/1  —  s2 


ds  I  d£. 


Changing  the  order  of  integration  in  (4.26),  we  rewrite  it  in  the  form 

(V2kh)(r)  =  (X2kh)(r)  +  ( Y2kh)(r ), 

with 

(X2 kh){r)  =  ^-f  T*k<yS\  [  enTS^/4h(C)x(^)ri(T,s,^)d(ds 


and 


(Y2kh)(r)  = 


2vr  J o  i/l  -  s2  J- i 
a  f1  T2k(s) 


e-^WhiOxiOvfas^dtds. 


2vr./o  Vl  -  s2  J- 1 
Finally,  defining  the  operators  71^,  and  C2fc  via  the  formulas 

(A2kg){r)  =  ^  -^==p(s,r)ds, 


a 


(B2kh)(s:  r)  =  -—  j  ei'yT^/4h(^)x(^)rj{r,s,^)d^, 


1  - i 


and 


(C2kh)(s,T)  =  ^  J  e  llTS^/4h(C)x(0v(T,s,C)  1  df, 


(4.27) 

(4.28) 

(4.29) 

(4.30) 

(4.31) 

(4.32) 
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we  observe  that  X2k  is  the  composition  of  A2k  with  B2k,  that  is, 

X2 k  =  A2k  o  B2k  (4.33) 

and  that  Y2k  is  composition  of  A2k  with  C2k,  that  is, 

I2 k  —  A2k  °  C2k-  (4.34) 

Due  to  Lemma  4.1  and  the  facts  that  0  <  x(£)  <  R,  \r)(r,  s,  £)|  =  1  >  and  0  <  a  <  R  the 
ranks  of  the  operators  B2k  and  C2k  are  bounded  by  N  to  precision  R2e/( 2tt),  where  N  is 
dehned  in  equation  (4.3).  Therefore  (4.27),  (4.33),  (4.34),  and  (4.30)  yield  that  the  rank  of 
V2 k  is  bounded  by  2 N  to  precision  R2e/7t. 

□ 


5  The  butterfly  algorithm 

This  section  contains  a  description  of  an  algorithm  for  the  application  of  n  x  n  matrices 
which  have  the  property  that  any  p  x  q  contiguous  submatrix  has  rank  to  precision  e  at  most 
a  constant  times  pq/n. 

5.1  Informal  description  of  the  algorithm 

We  now  illustrate  the  algorithm  in  the  particularly  simple  case  of  the  Fourier  transform  of 
size  n  —  2m.  We  define  the  function  /  via  the  formula 

n 

f(x)  =  Y.a^iX-  (5-1) 

k= 1 


where  aq,  a2,  ...,  an_i,  an  E  C  and  the  frequencies  oq,  co2,  ...,  u!n-i,  un  E  [0,  27t]  are 
equispaced.  Suppose  that  we  would  like  to  evaluate  the  function  /  at  n  equispaced  nodes 
Xi,  x2,  . . . ,  xn_i,  xn  E  [a,  6],  that  is  we  would  like  to  apply  to  the  vector  (aq,  a2,  . . . ,  an_i, 
an)T  the  n  x  n  matrix  S  dehned  via  the  formula 


S  = 


/  gtvlxl 

gitD  2*^1 

^VjJn  —  lX\ 

glUJnX  1  \ 

g%UJiX2 

glUJ2X2 

^iuJn  —  1  X2 

giiJnX2 

giuiXn-1 

glUJ2Xn-l 

^iuJn—lXn  —  1 

^iuJnXn  —  1 

g iW2Xn 

giuJn—iXn 

^iiJnXn  y 

(5.2) 


For  any  pair  of  subintervals  fl  C  [0,  27r]  and  X  C  [a,  b]  we  define  S(Q,  X)  to  be  the  submatrix 
of  S  given  by  the  intersection  of  those  columns  corresponding  to  Uk  E  D  and  those  rows 
corresponding  to  Xk  E  X. 

In  the  precomputation  stage  of  the  present  algorithm,  we  compress  the  matrix  S.  This 
allows  us,  in  the  application  stage,  to  evaluate  /  dehned  in  (5.1)  at  the  nodes  aq,  x2,  . . . , 
xn_i,  xn  in  0{n  log(n))  operations. 
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PRECOMPUTATION 


Level  0  On  level  0,  we  split  the  interval  [0,  2n]  into  2L  subintervals  of  length  27t/(2l). 
Specifically,  we  define  the  interval  Oo,fc  via  the  formula 

n0,k  =  [27r{k-l)2-L,27rk2-Ll  (5.3) 

for  every  integer  k  such  that  1  <  k  <  2L .  We  observe  that,  due  to  Remark  4.2,  if 

L  =  log2(fe  —  a)  (5.4) 

then  the  matrices  S(Qojk,  [a,  &])  have  constant  rank;  we  will  be  referring  to  this  rank  as  r,  so 
that 

r  =  0(1).  (5.5) 

The  matrices  S(D,q >k,  [a,  b})  are  illustrated  in  Figure  5.  We  compute  an  interpolative  de- 
compostion  (see  Lemma  3.1)  of  every  matrix  S(Qotk,[a,b]).  That  is,  for  every  matrix 
S{^o,k,  [o,  b]),  we  compute  a  column  skeleton  matrix  B$)k  which  contains  ~  r  columns  of 
[a,  b])  and  an  interpolation  matrix  P0,fc  which  contains  coefficients  expressing  every 
column  of  S(Qo,k,  la,  b])  as  a  linear  combination  of  the  columns  of  Ba  k,  that  is, 

S(n0,k,  [a,  bj)  =  B0tkp0jk,  (5.6) 


for  1  <  k  <  2L . 


Figure  5:  Level  0 


Remark  5.1  The  number  of  levels  L  defined  in  (5.4)  is  of  the  order  log2(n),  where  n  is  the 
number  of  nodes.  Indeed,  the  number  of  nodes  n  is  proportional  to  the  length  b  —  a  of  the 
interval  [a,  b\. 

Level  1  On  Level  1,  we  split  the  interval  [0,  2n\  into  2L~1  subintervals  of  length  27t/(2l~1). 
Each  of  the  subintervals  on  Level  1  is  obtained  by  merging  two  neighboring  intervals  on 
Level  0.  Specifically,  we  define  the  interval  fl\)k  via  the  formula 

nltk  =  [2vr  (k  -  l)2-(i-1),2vrfc2-(L-1)],  (5.7) 
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for  every  integer  k  such  that  1  <  k  <  2L~1.  Splitting  the  interval  [a,b]  in  two,  we  denote 
by  X2,i  the  first  half  of  [a,b]  and  by  X2^  the  second  half  of  [a,  b\.  As  described  in  Ob¬ 
servation  5.2  the  matrices  S(flitk,  A"i,y)  all  have  rank  ~  r.  The  matrices  S(Qitk,  Xi  J)  are 
illustrated  in  Figure  6.  We  compute  an  interpolative  decomposition  (see  Lemma  3.1)  of  each 
matrix  S(tti^,  X1:j)  on  Level  1.  Specifically,  for  each  matrix  S(Q ik,^i,j),  we  compute  a 
column  skeleton  matrix  -Bpyfc  which  contains  ~  r  columns  of  S(Qitk,  Xij);  together,  these 
columns  span  the  range  of  S(Qitk,  Xij).  Further,  we  compute  an  interpolation  matrix  P\  jj- 
containing  coefficients  which  express  halves  of  columns  in  skeleton  matrices  on  Level  0  as 
linear  combinations  of  columns  in  B Specihcally, 


2k— 1 


(5.8) 


and 

(B(),2k-1  BQ)2k)  =  Bit2,kPl,2,k,  (5-9) 

for  1  <  k  <  2i_1,  where  for  any  matrix  X,  the  top  half  of  X  is  denoted  by  X+  and  the 
bottom  half  of  X  is  denoted  by  X~ . 

The  interpolation  matrices  Pij,k  on  Level  1  all  have  size  ~  (r  x  2 r)  (see  Remark  5.3) 


Observation  5.2  All  submatrices  S(Q^k,  Xij)  on  all  levels  have  approximately  the  same 
rank,  namely  ~  r.  Indeed,  on  each  Level  l  such  that  1  <  /  <  L,  we  consider  subintervals 
Qitk  C  [0,  27t]  obtained  by  combining  two  neighboring  subintervals  on  the  previous  level  l  —  1. 
Moreover,  on  each  Level  /  such  that  1  <  l  <  L,  we  consider  subintervals  A)  j  C  [a,  b]  obtained 
by  splitting  a  subinterval  on  the  previous  level  in  half.  Therefore,  all  rectangles  on  all  levels 
have  the  same  area.  Definitions  (5.3)  and  (5.4)  yield  that  the  rectangles  Do ,k  x  [a,  b]  on 
Level  0  have  area  2n.  Due  to  Remark  4.2,  then,  all  the  matrices  S(£litk,Xij)  on  all  levels 
have  rank  0(1).  However,  in  practice,  not  all  the  matrices  S(Q^k,  Xjtk)  have  exactly  the 
same  rank;  their  ranks  are  similar  and  are  denoted  by  ~  r. 


Figure  6:  Level  1 


Remark  5.3  The  interpolation  matrices  on  Level  l  (for  1  <  l  <  L)  all  have  size  ~(rx  2 r). 
Indeed  all  submatrices  on  all  levels  have  the  same  rank  ~  r  as  the  submatrices  on  Level  0 
(see  Observation  5.2).  Each  submatrix  S(Qitk,  Xij)  on  Level  l  is  either  the  top  half  or  the 
bottom  half  of  two  adjacent  submatrices  on  Level  /  —  1;  we  refer  to  these  submatrices  on 
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Level  l  —  1  as  the  “parents”  of  S(Qi:k,  Xi  j).  For  each  of  the  ~  2 r  columns  in  the  skeleton 
matrices  corresponding  to  the  parents  of  S(Qitk,  X )  the  interpolation  matrix  Pi.3.k  on  Level  / 
contains  coefficients  expressing  that  column  as  a  linear  combination  of  the  columns  in  the 
skeleton  matrix  Bi  j 

Level  2  On  Level  2,  we  split  the  interval  [0,  27r]  into  2L~2  subintervals  of  length  2n /2L~2 . 
Each  of  the  subintervals  on  Level  2  is  obtained  by  merging  two  neighboring  intervals  on 
Level  1.  Specifically,  we  define  the  interval  O2 ,fc  via  the  formula 

=  [2vr  (fc  -  1)  2-(i-2),  2vr  k  2"(L“2)],  (5.10) 

for  every  integer  k  such  that  1  <  k  <  2L~2.  Splitting  the  interval  [a,b]  in  four,  we  define 
X2,i  to  be  the  first  quarter  of  [a,  b],  X2;2  to  be  the  second  quarter  of  [a,  6],  X2j3  to  be 
the  third  quarter  of  [a,  6],  and  X2 j4  to  be  the  fourth  quarter  of  [a,b\.  As  described  in 
Observation  5.2,  the  matrices  S(fl2:k,  X2  j)  all  have  rank  ~  r.  The  matrices  S(fl2^,  X2j) 
are  illustrated  in  Figure  7.  We  compute  an  interpolative  decomposition  (see  Lemma  3.1)  of 
each  matrix  S{£L2^,X2j)  on  Level  2.  Specifically,  for  each  matrix  S(Q2jk,  X2:3),  we  compute 
a  column  skeleton  matrix  L?2j,fc  which  contains  ~  r  columns  of  S(Q2tk,  X2j)]  together,  these 
columns  span  the  range  of  S(fl2tk,X2}j).  Further,  we  compute  an  interpolation  matrix  P2.j,k 
containing  coefficients  which  express  halves  of  columns  in  skeleton  matrices  on  Level  1  as 
linear  combinations  of  columns  of  B2j Specifically,  for  1  <  k  <  2L~2, 

{Bl,l(j+l)/2\,2k- 1  5l,L(j+l)/2j,2fc)+  =  B2,j,kP2,j,k  (5-11) 

when  j  —  1  or  j  —  3,  and 

(jBU(i+1)/2j,2fc-i  -Bi,L0'+i)/2j,2fc)  =  B2jkP2jk  (5-12) 

when  j  =  2  or  j  =  4,  where  for  any  matrix  X,  the  top  half  of  X  is  denoted  by  X+  and  the 
bottom  half  of  X  is  denoted  by  X~ .  The  interpolation  matrices  P2)j,k  on  Level  2  all  have 
size  ~  (r  x  2 r)  (see  Remark  5.3). 


Figure  7:  Level  2 

Level  L  Continuing  the  process,  we  arrive  finally  at  Level  L.  On  Level  L,  we  split  the 
interval  [a,  b\  into  2L  subintervals  of  length  ( b  —  a)/2L.  Specifically,  we  define  the  interval 
Xlj  via  the  formula 

XL,j  =  [a+  (b- a)  2~l  ( j  -  1),  a  +  (b  -  a)  2~L  j],  (5.13) 
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for  every  integer  j  such  that  1  <  j  <  2L.  As  described  in  Observation  5.2,  the  matrices 
S([0,2tt],  XLj)  all  have  rank  ~  r.  The  matrices  -S'( [0,  27t],  XLj)  are  illustrated  in  Figure  8. 
We  compute  an  interpolative  decomposition  (see  Lemma  3.1)  of  each  matrix  S([0,2tt],  XLj) 
on  Level  L.  Specifically,  for  each  matrix  SQO,  27t],  XLj),  we  compute  a  column  skeleton 
matrix  BLj  which  contains  ~  r  columns  of  ^([O,  27t],  XlJ)\  together,  these  columns  span 
the  range  of  S'([0,  2n,  Xlj).  Further,  we  compute  an  interpolation  matrix  Plj  containing 
coefficients  which  express  halves  of  columns  in  skeleton  matrices  on  Level  L  —  1  as  linear 
combinations  of  columns  of  Blj.  Specifically,  for  1  <3  <  2L, 

(£l-i,|Xri)/2J,i  BL_1A{j+1)/2i  ,2)+ =  BLjjPL,j  (5.14) 

when  j  is  odd,  and 

(-Bl-i,L(.j+i)/2J,i  -^l-i,L(j+i)/2J,2)  =  BljPlj  (5.15) 

when  j  is  even,  where  for  any  matrix  X,  the  top  half  of  X  is  denoted  by  X+  and  the 

bottom  half  of  X  is  denoted  by  X-.  Because  the  matrices  S'([0,  27t],  Xl  J)  on  Level  L  and 

the  matrices  S(flL-i,k,  Xl-ij)  on  Level  L  —  1  all  have  rank  ~  r,  the  interpolation  matrices 
PLj  on  Level  L  all  have  size  ~  (r  x  2 r)  (see  Remark  5.3). 


Figure  8:  Level  L 


APPLICATION 

The  inputs  to  this  stage  of  the  algorithm  are  n  coefficients  «i,  a2,  . . . ,  an_i,  an  and  the 
results  of  the  precomputation  described  above.  We  would  like  to  apply  the  matrix  S  defined 
in  (5.2)  to  the  vector  a  =  (an,  a2,  ...,  an-i,  «n)T-  That  is,  we  would  like  to  evaluate 
the  linear  combination  of  the  columns  of  S  with  coefficients  an,  a2,  . . . ,  an_i,  an.  This  is 
equivalent  to  evaluating  the  function  /  defined  in  (5.1)  at  each  of  the  nodes  an,  a;2,  . . . ,  xn_i, 
xn.  For  any  subinterval  C  [0,27r],  we  define  a(fl)  to  be  the  entries  of  a  corresponding  to 
the  frequencies  04.  e  fL 

Step  0  For  each  k  =  1,  2,  . . . ,  2L  —  1,  2L,  we  apply  the  interpolation  matrix  P0,fc,  defined 
in  (5.6),  to  the  vector  a(h20,fc)  obtaining  the  vector  /50,fc -  The  vector  /30,fc  consists  of  ~  r 
coefficients  representing  the  effect  at  all  the  nodes  an,  x2,  . . . ,  x„_i,  xn  G  [a,  b\  of  the  n/ 2L 
frequencies  a(f20,fc)- 
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Step  1  Applying  the  interpolation  matrices  Pi,j,k  defined  in  (5.8)  and  (5.9),  we  calculate 
the  vectors  B\.3.k  via  the  formula 


Pl,j,k 


(  A),2fc-l\ 

\  A), 2 k  )  ’ 


(5.16) 


for  each  pair  of  integers  j,  k  such  that  1  <  k  <  2L_1  and  1  <  j  <  2,  where  the  vectors 
/3o:2k  and  /3o,2fc-i  were  computed  in  Step  0.  The  vector  fi\.jyk  consists  of  ~  r  coefficients 
representing  the  effect  at  the  n/ 2  nodes  xm  G  X3j  of  the  n/ 2h~1  frequencies  a(h2l  fc). 

Step  2  Applying  the  interpolation  matrices  P2j,fc  defined  in  (5.11)  and  (5.12),  we  calcu¬ 
late  the  vectors  via  the  formula 


02  ,j,k 


(  A,L(i+l)/2j,2fc-l\ 

V  A,L(i+l)/2j,2fc  )  ’ 


(5.17) 


for  each  pair  of  integers  j  and  k  such  that  1  <  k  <  2L~2  and  1  <  j  <  4,  where  the  vectors 
AL,L(j+i)/2j,2fc-i  and  Pi,[(j+i)/2},2k  were  computed  in  Step  1.  The  vector  /?2 j,k  consisits  of  ~  r 
coefficients  representing  the  effect  at  the  n/4  nodes  xm  G  X2j  of  the  n/2L~2  frequencies 
Of(^2,fc)- 

Step  L  Continuing  the  process,  we  arrive  at  Step  L.  Applying  the  interpolation  matrices 
Plj  defined  in  (5.14)  and  (5.15),  we  calculate  the  vectors  0L.j  via  the  formula 


0L,j 


p  (  /?l-i,L0+i)/2J,A 

L'j  ’ 


(5.18) 


for  every  integer  j  such  that  1  <  j  <  2L,  where  the  vectors  /?l- i,L(j+i)/2j,i  and  /9l-i,L0'+i)/2J,2 
were  calculated  in  Step  L  —  1.  Finally,  we  apply  the  matrix  j  to  the  vector  / 3lj  obtaining 
the  product 

BLJpLJ  =  S([  0,2n\,XLJ)a.  (5.19) 

Equation  (5.19)  states  that  the  vector  represents  the  effect  at  the  n/ 2L  nodes  xm  G  XLj 
of  all  n  frequencies  ai,  a2,  . . . ,  a„_i,  an  G  [0,  27r] .  In  other  words,  linearly  combining  the 
~  r  columns  of  Blj  with  coefficients  Bl.j  yields  the  same  vector  of  length  n/ 2L  as  linearly 
combining  the  n  columns  of  S'QO,  27t],  Xlj)  with  coefficients  given  by  the  entries  of  the  vector 
a.  Thus,  BljPlj  is  the  jth  block  of  n/2L  entries  in  the  vector  Sot. 


Remark  5.4  The  algorithm  of  the  present  paper  exhibits  the  same  performance  when  ap¬ 
plied  to  the  transposed  matrix  ST ,  since  all  relevant  submatrices  of  ST  satisfy  the  requisite 
bound  on  their  ranks  (see  Remark  4.2). 


Remark  5.5  We  have  described  the  algorithm  in  the  illustrative  case  of  the  equispaced 
Fourier  transform  of  size  n  =  2m.  The  description  is  similar  for  other  sizes  of  matrices  and 
other  transorms;  it  is  therefore  omitted. 
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5.2  CPU  requirements 

5.2.1  Precomputation 

On  Level  0,  we  compute  the  2L  interpolative  decompositions  of  the  n  x  (n/ 2L)  matrices  of 
rank  ~  r  on  the  left,  hand  side  of  (5.6);  this  takes  a  total  of  0{rn 2)  operations.  It  then  follows 
from  (5.5)  that  we  require  0(n 2)  operations  on  Level  0.  On  each  Level  l,  for  1  <  l  <  L ,  we 
compute  interpolative  decompositions  of  the  (n/ 2l)x  ~  2 r  matrices 


(■Bi,L0'+l)/2j,2fc-l 

A,L(j+l)/2j,2fe)  + 

(5.20) 

(■Bi,L(j'+l)/2j,2fc-l 

A,LL+i)/2j,2fe)  ; 

(5.21) 

these  matrices  have  rank  r.  There  are  2L  such  matrices  on  each  level.  In  total  this  requires 
of  the  order 

(5.22) 

/=0 

operations.  It  follows  from  (5.5)  and  Remark  5.1  that  2 L+2r2n  =  0(n2);  the  precomputation 
therefore  takes  0{n 2)  operations. 

5.2.2  Application 

In  Step  0,  for  each  integer  k  such  that  1  <  k  <  2L,  we  calculate  2L  vectors  /30,fc  of  length  ~  r 
by  applying  the  matrix  Po,fc  of  size  ~  r  x  ( n/2L )  to  the  vector  ct(12o,fc)  of  length  n/ 2L.  For 
integers  l,  k,  and  j  such  that  1  <  l  <  L  and  1  <  k  <  2L~l,  with  1  <  j  <  2l,  we  compute  the 
vector  /3ijyk  of  length  ~  r  by  applying  the  ~  (r  x  2r)  matrix  Pijjk  to  a  vector  of  length  ~  2 r. 
Finally,  on  level  L,  for  each  integer  j  such  that  1  <  j  <  2L,  we  apply  the  column  skeleton 
matrix  Bkj  (see  (5.19))  having  size  (n/ 2L)x  ~  r  to  the  vector  /3lj  of  length  ~  r.  In  total, 
the  time  taken  to  apply  the  matrix  S  to  an  arbitrary  vector  is  0(rn  +  r2L2L).  Combining 
Remark  5.1  and  (5.5)  then  yields  that  we  require  0(rn  +  r2L2L)  =  0(n\og(n))  operations 
to  apply  the  matrix  S  to  an  arbitrary  vector  a. 

5.3  Memory  requirements 

5.3.1  Precomputation 

On  Level  0,  we  store  2L  interpolation  matrices  each  having  size  ~  r  x  (n/2L).  On  each 
Level  l,  for  1  <  l  <  L ,  we  store  2L  interpolation  matrices,  each  having  size  ~  (r  x  2 r)  (see 
Remark  5.3).  On  Level  L ,  we  store  an  additional  2L  column  skeleton  matrices,  each  having 
size  (n/2L)  x  ~  r.  The  total  memory  requirement  for  the  precomputation  is  therefore  0(rn+ 
L2Lr2).  Combining  Remark  5.1  and  (5.5)  then  yields  that  the  total  memory  requirement  for 
the  precomputation  is  0(rn  +  L2Lr 2)  =  0(nlog(n)). 
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5.3.2  Application 

During  the  application  stage  of  the  present  algorithm,  the  interpolation  matrices  Pij,k  (f°r 
0  <  l  <  L)  and  the  column  skeleton  matrices  Blj  on  Level  L  must  be  kept  in  memory; 
this  requires  0(nlog(n))  memory,  as  described  in  Section  5.3.1.  In  addition,  in  Step  0,  we 
store  2l  vectors  /30,fc  of  length  ~  r.  On  each  Level  l  such  that  1  <  l  <  l,  we  store  2L  vectors 
Bi-j.k  (see  (5.16),  (5.17),  and  (5.18))  of  length  ~  r.  This  requires  0[rL2L )  memory.  Com¬ 
bining  (5.5)  with  Remark  5.1  then  yields  that  the  memory  requirement  for  the  application 
stage  is  0{n\og{n)). 

5.4  Detailed  description  of  the  algorithm 

This  section  contains  a  detailed  description  of  the  algorithm  that  was  described  informally 
in  Section  5.1.  Given  an  integer  n  =  2m  and  an  n  x  n  matrix  S,  such  that  any  p  x 
q  contiguous  submatrix  of  S  has  rank  bounded  by  a  constant  times  pq/n,  we  compute 
interpolative  decompositions  of  submatrices  of  S.  We  then  apply  S  to  an  arbitrary  vector  a 
rapidly,  yielding  f  =  Sot. 

In  this  section  we  denote  by  the  kth  block  of  n/ 2L  entries  in  a.  Similarly,  we  denote 
by  f L,k  the  kth  block  of  n/ 2L  entries  in  /. 

Initialization  Step 

Choose  principal  parameters  and  create  dyadic  hierarchy 

1.  Choose  a  positive  real  number  e.  All  interpolative  decompositions  in  this  algorithm 
are  computed  to  precision  e. 

2.  Choose  Cmax,  the  number  of  columns  in  each  submatrix  on  Level  0. 

Comment  [In  what  follows,  we  assume  that  the  value  of  G'max  chosen  above  is  a 
positive  integer  power  of  2.  If  this  is  not  the  case,  the  algorithm  is  similar  and  its 
description  is  therefore  omitted.] 

3.  Choose  the  number  of  levels  L  in  the  hierarchy  described  in  Section  5.1  according  to 
the  formula  L  =  log 2(n/Cmax). 

Comment  [Create  the  dyadic  hierarchy  of  subblocks  of  the  matrix  S.  On  each  of  the  L  +  1 
levels  of  the  hierarchy,  there  are  2L  submatrices.  Retain  the  structure  created  for  use  in 
precomputation.] 
do  l  —  0, 1, . . . ,  L  —  1,  L 
doj  =  l,2,...,2'  — 1,2* 

do  k  =  1,2,  ...,2L~l  -  1,  2L~l 

Define  S(Qitk,  X )  to  be  the  submatrix  consisting  of  rows  (j  —  l)2m-*  +  l 
through  j2m~l  of  S  and  columns  (k  —  l)2m-i+*  +  1  through  k2m~L+l  of 
S. 

end  do 
end  do 
end  do 
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Precomputation  Step 


Precomputation 

Comment  [In  this  stage,  all  submatrices  S(Vt^k,  Xij)  of  S  are  compressed  using  the  inter¬ 
polate  decomposition  described  in  Section  3.1.2.] 
do  l  —  0, 1, . . . ,  L  —  1,  L 
do  J  =  1,  2, . . . ,  2Z  —  1,  2l 

do  k  =  1,2, ,  2L~l  -  1,  2L~l 
if  /  =  0  then 

1.  Compute  the  interpolate  decomposition 

S(nl>k,Xu)  =  B0tkPo,k-  (5.23) 

2.  Store  P0,fc- 

if  (/  >  0  and  j  is  odd)  then 

1.  Compute  the  interpolative  decomposition 

=  Bi,j,k  Pi,j,k-  (5-24) 

2.  Store  Pkj.k- 

3.  if  l  =  L  then  store  Bij^- 
if  (/  >  0  and  j  is  even)  then 

1.  Compute  the  interpolative  decomposition 

(-®z-i,L^J,2fc-i-^/-i,L^J,2fc)  —  Bi,j,kPi,j,k-  (5.25) 

2.  Store  Pi3.k. 

3.  if  l  =  L  then  store  Bij^. 

end  do 
end  do 
end  do 

Application  Step 

Application 

Comment  [Given  an  arbitrary  vector  a,  we  compute  a  vector  /  in  this  Step  such  that 
/  =  Sa  to  precision  e,  using  the  interpolative  decompositions  computed  in  Step  P.] 
do  l  —  0, 1, . . . ,  L  —  1,  L 
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do  j  =  1,2,  ...,2l  —  1,2* 

do  k  =  1,  2, . . . ,  2L~l  -  1,  2L~l 

if  l  =  0  then  calculate  and  store 


A),fc  —  Po,k  OiL,k- 

(5.26) 

if  l  >  0  then  calculate  and  store 

a  _  p  ( A-i,Li±U,2fc-i\ 

Pl,j,k  —  -T l,j,k  n 

\  k  J 

(5.27) 

if  /  =  L  then  calculate  and  store 

h,j  =  blj  PL,j- 

(5.28) 

end  do 
end  do 
end  do 

Comment  [The  vectors  /l,i,  /l, 2,  •  •  ■ ,  /l, 2L-h  fi. 2L  are  concatenated  to  form  the  vector  /. 
The  vector  /  satisfies  /  =  Sa  to  precision  e] 

5.5  An  adaptive  version  of  algorithm 

In  practice,  any  two  different  contiguous  submatrices  of  S  with  the  same  number  of  entries 
usually  have  slightly  different  ranks  to  precision  e  (see  Observation  5.2).  It  is  possible  to 
modify  the  algorithm  described  in  Sections  5.1  and  5.4  such  that,  for  a  given  positive  integer 
paramter  rmax,  every  submatrix  for  which  we  calculate  an  interpolative  decomposition  has 
rank  at  most  rmax.  Specifically,  if  the  submatrix  S(Qi:k,  Xij)  has  rank  to  precision  e  greater 
than  rmax,  we  do  not  compute  the  interpolative  decomposition  of  S(£lijk,  Xij)  but  partition 
S(Qi}kl  Xij)  into  its  top  half  and  its  bottom  half.  Similarly,  if  T  is  any  contigous  submatrix 
of  S  encountered  in  the  adaptive  dyadic  hierarchy  such  that  the  rank  of  S  is  greater  than 
t max?  we  do  not  compute  the  interpolative  decomposition  of  T,  but  partition  T  into  its  top 
half  and  its  bottom  half.  We  compute  the  interpolative  decompositions  of  those  contiguous 
submatrices  encountered  in  the  adaptive  dyadic  hieararchy  whose  numerical  ranks  are  at 
most  rmax.  Figure  9  illustrates  one  possible  partition  of  the  matrix  A  on  Level  1  of  the 
adaptive  algorithm. 

Observation  5.6  With  an  appropriate  choice  of  rmax,  we  have  not  yet  encountered  a  case  in 
which  the  adaptive  algorithm  applies  a  matrix  more  slowly  than  the  non-adaptive  algorithm. 
This  appears  to  be  due  to  a  combination  of  more  efficient  CPU  cache  usage  with  decreased 
complexity  of  the  algorithm. 

6  Numerical  Examples 

In  this  section,  we  describe  the  results  of  several  numerical  tests  of  the  algorithm  described 
in  Section  5.  In  the  examples,  we  use  the  adaptive  algorithm  described  in  Section  5.5.  We 
perform  all  computations  to  precision  e  =  10“10,  unless  specified  otherwise. 
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Figure  9:  One  possible  partition  of  A  on  level  1  of  the  adaptive  algorithm 


We  performed  all  computations  using  IEEE  standard  double-precision  variables,  whose 
mantissas  have  approximately  one  bit  of  precision  less  than  16  digits  (so  that  the  relative 
precision  of  the  variables  is  approximately  .2E-15).  We  ran  all  computations  on  one  core  of 
a  2.66  GHz  Intel  E6750  Core  Duo  microprocessor  with  4MB  of  L2  cache  and  4GB  of  RAM. 
We  compiled  the  Fortran  77  code  using  the  Lahey/Fujitsu  Linux  Express  v6.2  compiler,  with 
the  optimization  flag  — o2  enabled.  The  Lahey/Fujitsu  Express  v6.2  compiler  can  address 
only  about  2GB  of  RAM  per  array. 

The  columns  labeled  “n”  in  the  following  tables  list  the  size  of  the  matrix  to  which  the 
algorithm  described  in  Section  5  was  applied.  All  matrices  in  these  examples  are  square, 
unless  specified  otherwise.  The  columns  labeled  “Precomputation”  list  the  times  taken 
in  seconds  for  the  initialization  and  precomputation  steps  of  the  algorithm  described  in 
Section  5.  The  columns  labeled  “Direct  evaluation”  list  the  times  taken  in  seconds  for  a  direct 
matrix-vector  multiplication.  For  large  matrices,  the  times  taken  for  a  direct  matrix- vector 
multiplication  were  estimated  and  are  in  parentheses.  The  columns  labeled  “Fast  evaluation" 
list  the  times  taken  to  apply  the  matrix  using  the  algorithm  described  in  Section  5.  The 
columns  labeled  “Z2  error”  contain  the  relative  errors  between  the  solution  obtained  via 
the  algorithm  described  in  Section  5  and  the  solution  obtained  via  a  direct  matrix-vector 
multiplication.  The  columns  labeled  “MB  used”  list  the  amount  of  memory  in  megabytes 
required  by  the  algorithm  for  precomputation  and  evaluation.  In  these  examples,  we  apply 
each  matrix  with  n  columns  to  the  same  vector  =  t/ra)/||?/req|,  where  is  a  vector 
containing  n  independent  random  entries  chosen  uniformly  at  random  from  the  interval  [0, 1]. 

Remark  6.1  It  should  be  noted  that  no  effort  has  been  made  to  optimize  the  running  time 
of  the  precomputations  stage  in  the  butterfly  algorithm,  either  algorithmically  or  in  the 
implementation.  Thus,  while  the  times  listed  below  under  the  heading  “Fast  evaluation”  are 
a  reasonable  indication  of  the  algorithm’s  behavior,  those  listed  under  “Precompuatation" 
should  be  regarded  as  slower  than  necessary. 
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n 

Precomputation 

Direct  evaluation 

Fast  evaluation 

l2  error 

MB  used 

256 

.54E+00 

.85E-04 

.86E-04 

.69E-11 

.43E+00 

512 

.lOE+Ol 

.34E-03 

.26E-03 

.11E-10 

.13E+01 

1024 

.30E+01 

.17E-02 

.76E-03 

.lOE-10 

.34E+01 

2048 

.91E+01 

.68E-02 

.23E-02 

.80E-11 

.90E+01 

4096 

.31E+02 

.27E-01 

.58E-02 

.82E-11 

.22E+02 

8192 

.12E+03 

.11E+00 

.14E-01 

.92E-11 

.54E+02 

16384 

.51E+03 

.44E+00 

.33E-01 

.93E-11 

.13E+03 

32768 

.23E+04 

(.17E+01) 

.79E-01 

.92E-11 

.30E+03 

65536 

.10E+05 

(.70E+01) 

.18E+00 

.11E-10 

.70E+03 

131072 

.45E+05 

(.28E+02) 

.43E+00 

.13E-10 

.17E+04 

Table  1:  Times,  errors,  and  memory  usage  for  the  Legendre  transform  with  rmax  =  72. 
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Figure  10:  Comparison  of  the  algorithm  of  Section  5  with  direct  calculation  for  evaluating 
the  Legendre  transform. 


6.1  The  Legendre  transform 

Table  1  and  Figure  10  display  the  results  of  applying  the  algorithm  described  in  Section  5  to 
the  change  of  basis  matrix  Tp  from  the  standard  basis  to  the  basis  of  Legendre  polynomials. 
We  chose  the  value  rmax  =  72  to  optimize  the  running  times  in  Table  1.  The  parameter  rmax 
is  described  in  Section  5.5. 

6.2  The  Laguerre  transform 

Table  2  and  Figure  11  display  the  results  of  applying  the  algorithm  of  Section  5  to  the 
change  of  basis  matrix  Tp  from  the  standard  basis  to  the  basis  of  Laguerre  polynomials.  We 
chose  the  value  rmax  =  83  to  optimize  the  running  times  in  Table  2.  The  parameter  rmax  is 
described  in  Section  5.5. 

6.3  The  Hermite  transform 

Table  3  and  Figure  12  display  the  results  of  applying  the  algorithm  of  Section  5  to  the 
change  of  basis  matrix  Tp  from  the  standard  basis  to  the  basis  of  Hermite  polynomials.  We 
chose  the  value  rmax  =  90  to  optimize  the  running  times  in  Table  3.  The  parameter  rmax  is 
described  in  Section  5.5. 

6.4  The  non-equispaced  Fourier  transform 

Table  4  and  Figure  13  display  the  results  of  applying  the  algorithm  described  in  Section  5  to 
the  matrix  Tp  defined  in  equation  (3.8),  where  the  nodes  x3  are  chosen  uniformly  at  random 
from  the  interval  [0,  2n]  and  the  frequencies  c Oj  are  chosen  uniformly  at  random  from  the 
interval  [—n,  n\.  We  chose  the  parameter  rmax  =  73  to  optimize  the  running  times  in  Table  4. 
The  parameter  rmax  is  described  in  Section  5.5. 

6.5  The  Fourier-Bessel  transform 

Tables  5-10  and  Figures  14-19  display  the  results  of  applying  the  algorithm  described  in 
Section  5  to  the  matrix  T£  defined  in  equation  (3.44)  where  the  real  numbers  y3  are  the 
Gaussian  quadrature  nodes  associated  with  Legendre  polynomials  on  the  interval  [0, 1]  de¬ 
fined  in  Formula  25.4.30  in  [1].  The  value  of  R  used  in  the  definition  of  the  function  J 
in  (3.26)  is  R  =  1. 

In  Table  5,  we  chose  the  value  rmax  =  93  to  optimize  the  running  time  of  the  algorithm 
described  in  Section  5  for  the  application  of  the  Fourier-Bessel  series  transform  of  order 
m  =  n/4.  This  same  value,  rmax  =  93,  is  used  in  Tables  5-10.  The  parameter  rmax  is 
described  in  Section  5.5. 
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n 

Precomputation 

Direct  evaluation 

Fast  evaluation 

l2  error 

MB  used 

256 

.52E+00 

.85E-04 

.83E-04 

.18E-10 

.42E+00 

512 

.12E+01 

.34E-03 

.24E-03 

.32E-10 

.12E+01 

1024 

.40E+01 

.17E-02 

.70E-03 

.13E-09 

.33E+01 

2048 

.15E+02 

.68E-02 

.23E-02 

.15E-09 

.88E+01 

4096 

.66E+02 

.27E-01 

.56E-02 

.17E-09 

.21E+02 

8192 

.30E+03 

.11E+00 

.13E-01 

.23E-09 

.52E+02 

16384 

.14E+04 

.44E+00 

.32E-01 

.31E-09 

.12E+03 

32768 

.62E+04 

(.17E+01) 

.75E-01 

.39E-09 

.29E+03 

65536 

.27E+05 

(.70E+01) 

.17E+00 

.58E-09 

.68E+03 

131072 

.12E+06 

(.28E+02) 

.42E+00 

.80E-09 

.16E+04 

Tabic  2:  Times,  errors,  and  memory  usage  for  the  Laguerre  transform  with  rmax  =  83. 
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Figure  11:  Comparison  of  the  algorithm  of  Section  5  with  direct  calculation  for  evaluating 
the  Laguerre  transform. 
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n 

Precomputation 

Direct  evaluation 

Fast  evaluation 

l2  error 

MB  used 

256 

.55E+00 

.85E-04 

.88E-04 

.38E-11 

.45E+00 

512 

.13E+01 

.34E-03 

.25E-03 

.15E-10 

.13E+01 

1024 

.42E+01 

.17E-02 

.71E-03 

.22E-10 

.34E+01 

2048 

.16E+02 

.68E-02 

.23E-02 

.28E-10 

.89E+01 

4096 

.70E+02 

.27E-01 

.57E-02 

.33E-10 

.22E+02 

8192 

.31E+03 

.11E+00 

.13E-01 

.34E-10 

.53E+02 

16384 

.15E+04 

.44E+00 

.32E-01 

.38E-10 

.13E+03 

32768 

.67E+04 

(.17E+01) 

.75E-01 

.46E-10 

.30E+03 

65536 

.30E+05 

(.70E+01) 

.17E+00 

.51E-10 

.69E+03 

131072 

.13E+06 

(.28E+02) 

.40E+00 

.58E-10 

.16E+04 

Table  3:  Times,  errors,  and  memory  usage  for  the  Hermite  transform  with  rmax  =  90. 
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Figure  12:  Comparison  of  the  algorithm  of  Section  5  with  direct  calculation  for  evaluating 
the  Hermite  transform. 
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n 

Precomputation 

Direct  evaluation 

Fast  evaluation 

l 2  error 

MB  used 

256 

.42E+00 

.21E-03 

.21E-03 

.76E-09 

.93E+00 

512 

.11E+01 

.88E-03 

.59E-03 

.28E-09 

.25E+01 

1024 

.36E+01 

.38E-02 

.18E-02 

.17E-08 

.64E+01 

2048 

.49E+02 

.15E-01 

.45E-02 

.26E-09 

.16E+02 

4096 

.50E+02 

.61E-01 

.11E-01 

.10E-08 

.38E+02 

8192 

.20E+03 

.24E+00 

.25E-01 

.15E-08 

.87E+02 

16384 

.86E+03 

(.97E+00) 

.58E-01 

.22E-08 

.20E+03 

32768 

.36E+04 

(.39E+01) 

.13E+00 

.13E-08 

.45E+03 

65536 

.15E+05 

(.19E+02) 

.30E+00 

.10E-08 

.10E+04 

Table  4:  Times,  errors,  and  memory  usage  for  the  non-equispaced  Fourier  transform  with 
^max  73. 
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Figure  13:  Comparison  of  the  algorithm  of  Section  5  with  direct  calculation  for  evaluating 
the  non-equispaced  Fourier  transform. 
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n 

f -m-C 

Precomputation 

Direct 

evaluation 

Fast 

evaluation 

l2  error 

MB  used 

512 

118 

.10E+01 

.77E-04 

.58E-04 

.61E-11 

.28E+00 

1024 

246 

.19E+01 

.33E-03 

.16E-03 

.87E-11 

.76E+00 

2048 

502 

.45E+01 

.17E-02 

.44E-03 

.13E-10 

.21E+01 

4096 

1014 

.11E+02 

.69E-02 

.13E-02 

.15E-10 

.52E+01 

8192 

2038 

.29E+02 

.28E-01 

.33E-02 

.17E-10 

.13E+02 

16384 

4086 

.91E+02 

(.11E+00) 

.79E-02 

.21E-10 

.31E+02 

32768 

8182 

.32E+03 

(.45E+00) 

.18E-01 

.22E-10 

.73E+02 

65536 

16374 

.13E+04 

(.18E+01) 

.42E-01 

.23E-10 

.17E+03 

131072 

32758 

.55E+04 

(.72E+01) 

.95E-01 

.28E-10 

.41E+03 

Tabic  5:  Times,  errors,  and  memory  usage  for  calculating  the  first  n/2  —  m  —  10  coefficients 
in  the  Fourier-Bessel  expansion  of  order  m  =  n/4,  discretized  at  n  nodes,  that  is,  applying 
the  real  |  —  m  —  C  x  n  matrix  T™  defined  in  equation  (3.44)  with  rmax  =  93  and  C  =  10. 
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Figure  14:  Comparison  of  the  algorithm  of  Section  5  with  direct  calculation  for  evaluating  the 
first  n/2  —  m  —  10  coefficients  in  the  Fourier-Bessel  expansion  of  order  m  =  n/4,  discretized 
at  n  nodes,  that  is,  applying  the  real  |  —  m  —  C  x  n  matrix  defined  in  equation  (3.44) 
with  rmax  =  93  and  C  =  10. 
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n 

f -m-C 

Precomputation 

Direct 

evaluation 

Fast 

evaluation 

l2  error 

MB  used 

512 

246 

.10E+01 

.24E-03 

.12E-03 

.67E-11 

.61E+00 

1024 

502 

.21E+01 

.73E-03 

.35E-03 

.68E-11 

.17E+01 

2048 

1014 

.56E+01 

.34E-02 

.11E-02 

.35E-11 

.47E+01 

4096 

2038 

.16E+02 

.14E-01 

.30E-02 

.46E-11 

.12E+02 

8192 

4086 

.53E+02 

.55E-01 

.72E-02 

.66E-11 

.29E+02 

16384 

8182 

.20E+03 

(.22E+00) 

.17E-01 

.92E-11 

.69E+02 

32768 

16374 

.83E+03 

(.88E+00) 

.39E-01 

.76E-11 

.16E+03 

65536 

32758 

.37E+04 

(.35E+01) 

.90E-01 

.49E-10 

.38E+03 

131072 

65526 

.17E+05 

(.14E+02) 

.21E+00 

.10E-09 

.89E+03 

Table  6:  Times,  errors,  and  memory  usage  for  calculating  the  first  n/2  —  10  coefficients  in 
the  Fourier-Bessel  expansion  or  order  m  —  0,  discretized  at  n  nodes,  that  is,  applying  the 
real  ^  —  C  x  n  matrix  Tq  defined  in  equation  (3.44)  with  rmax  =  93  and  C  —  10. 
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Figure  15:  Comparison  of  the  algorithm  of  Section  5  with  direct  calculation  for  evaluating 
the  first  n/2  —  10  coefficients  in  the  Fourier-Bessel  expansion  of  order  m  =  0,  discretized  at 
n  nodes,  that  is,  applying  the  real  |  —  m  —  C  x  n  matrix  T/1  defined  in  equation  (3.44)  with 
t max  =  93  and  C  =  10. 
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n 

§ -m-C 

Precomputation 

Direct 

evaluation 

Fast 

evaluation 

l 2  error 

MB  used 

512 

245 

.13E+01 

.24E-03 

.12E-03 

.46E-11 

.60E+00 

1024 

501 

.26E+01 

.73E-03 

.35E-03 

.15E-10 

.17E+01 

2048 

1013 

.66E+01 

.34E-02 

.11E-02 

.56E-11 

.47E+01 

4096 

2037 

.18E+02 

.14E-01 

.30E-02 

.90E-11 

.12E+02 

8192 

4085 

.57E+02 

.55E-01 

.72E-02 

.10E-10 

.29E+02 

16384 

8181 

.21E+03 

(.22E+00) 

.17E-01 

.lOE-10 

.69E+02 

32768 

16373 

.85E+03 

(.88E+00) 

.39E-01 

.78E-11 

.16E+03 

65536 

32757 

.38E+04 

(.35E+01) 

.90E-01 

.79E-09 

.38E+03 

131072 

65525 

.17E+05 

(.14E+02) 

.21E+00 

.42E-09 

.90E+03 

Table  7:  Times,  errors,  and  memory  usage  for  calculating  the  first  n/2  —  11  coefficients  in 
the  Fourier-Bessel  expansion  of  order  m  =  1,  discretized  at  n  nodes,  that  is,  applying  the 
real  ^  —  1  —  C  x  n  matrix  7/n  defined  in  equation  (3.44)  with  rmax  =  93  and  C  =  10. 
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Figure  16:  Comparison  of  the  algorithm  of  Section  5  with  direct  calculation  for  evaluating 
the  first  n/2  —  11  coefficients  in  the  Fourier-Bessel  expansion  of  order  m  =  1,  discretized  at 
n  nodes,  that  is,  applying  the  real  |  —  m  —  C  x  n  matrix  T[l  defined  in  equation  (3.44)  with 
t max  =  93  and  C  =  10. 
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n 

f -m-C 

Precomputation 

Direct 

evaluation 

Fast 

evaluation 

l2  error 

MB  used 

512 

146 

.10E+01 

.95E-04 

.69E-04 

.77E-11 

.34E+00 

1024 

402 

.22E+01 

.54E-03 

.27E-03 

.13E-10 

.13E+01 

2048 

914 

.61E+01 

.30E-02 

.97E-03 

.15E-10 

.42E+01 

4096 

1938 

.19E+02 

.13E-01 

.28E-02 

.15E-10 

.11E+02 

8192 

3986 

.69E+02 

.53E-01 

.70E-02 

.15E-10 

.28E+02 

16384 

8082 

.27E+03 

(.21E+00) 

.17E-01 

.22E-10 

.68E+02 

32768 

16274 

.11E+04 

(.87E+00) 

.39E-01 

.23E-10 

.16E+03 

65536 

32658 

.51E+04 

(.35E+01) 

.90E-01 

.17E-10 

.38E+03 

131072 

65426 

.22E+05 

(.14E+02) 

.21E+00 

.22E-10 

.90E+03 

Table  8:  Times,  errors,  and  memory  usage  for  calculating  the  first  n/2  —  110  coefficients  in 
the  Fourier-Bessel  expansion  of  order  m  =  100,  discretized  at  n  nodes,  that  is,  applying  the 
real  |  —  100  —  C  x  n  matrix  T™00  defined  in  equation  (3.44)  with  rmax  =  93  and  C  =  10. 
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Figure  17:  Comparison  of  the  algorithm  of  Section  5  with  direct  calculation  for  evaluating 
the  first  n/2  —  110  coefficients  in  the  Fourier-Bessel  expansion  of  order  m  =  100,  discretized 
at  n  nodes,  that  is,  applying  the  real  |  —  m  —  C  x  n  matrix  defined  in  equation  (3.44) 
with  rmax  =  93  and  C  =  10. 
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n 

f -m-C 

£ 

Precomputation 

Direct 

evaluation 

Fast 

evaluation 

l2  error 

MB  used 

8192 

4086 

o 

1 

K— 

.44E+02 

.55E-01 

.55E-02 

.10E-04 

.22E+02 

8192 

4086 

10“6 

.47E+02 

.55E-01 

.62E-02 

.17E-06 

.25E+02 

8192 

4086 

10“8 

.50E+02 

.55E-01 

.67E-02 

.13E-08 

.27E+02 

8192 

4086 

o 

1 

O 

.53E+02 

.55E-01 

.73E-02 

.66E-11 

.29E+02 

8192 

4086 

O 

1 

to 

.54E+02 

.55E-01 

.82E-02 

.91E-13 

.32E+02 

8192 

4086 

10~14 

.11E+03 

.55E-01 

.54E-01 

.49E-14 

.21E+03 

8192 

4086 

10"16 

.13E+03 

.55E-01 

.68E-01 

.46E-14 

.27E+03 

Tabic  9:  Times,  errors,  and  memory  usage  for  calculating  the  first  n/2  —  10  coefficients  in 
the  Fourier- Bessel  expansion  of  order  m  =  0,  discretizing  at  n  nodes,  that  is,  applying  the 
real  |  -Cxn  matrix  T/1  defined  in  equation  (3.44)  with  rmax  =  93,  C  =  10,  n  =  8192,  and 
various  precisions  e. 
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Figure  18:  Times  and  errors  for  calculating  the  first  n/2  —  10  coefficients  in  the  Fourier- 
Bessel  expansion  of  order  m  —  0,  discretizing  at  n  nodes,  that  is,  applying  the  real  |  —  C  x  n 
matrix  Tq  defined  in  equation  (3.44)  with  rmax  =  93,  C  =  10,  n  =  8192,  and  various 
precisions  e. 
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n  |  —  m  —  C  e  Precomputation 


16384 

8182 

I—1 

o 

1 

K- 

.16E+03 

16384 

8182 

10"6 

.17E+03 

16384 

8182 

o 

i 

00 

.18E+03 

16384 

8182 

O 

1 

o 

.20E+03 

16384 

8182 

O 

1 

to 

.23E+03 

16384 

8182 

O 

1 

.52E+03 

16384 

8182 

10-16 

.59E+03 

Direct 

evaluation 

Fast 

evaluation 

l 2  error 

MB  used 

(.22E+00) 

.13E-01 

.11E-04 

.22E+02 

(.22E+00) 

.14E-01 

.85E-07 

.25E+02 

(.22E+00) 

.16E-01 

.92E-09 

.27E+02 

(.22E+00) 

.17E-01 

.92E-11 

.29E+02 

(.22E+00) 

.26E-01 

.13E-12 

.32E+02 

(.22E+00) 

.23E+00 

.57E-14 

.21E+03 

(.22E+00) 

.27E+00 

.55E-14 

.27E+03 

Table  10:  Times,  errors,  and  memory  usage  for  calculating  the  first  n/2  —  10  coefficients  in 
the  Fourier- Bessel  expansion  of  order  m  =  0,  discretizing  at  n  nodes,  that  is,  applying  the 
real  |  —  C  x  n  matrix  Tq  defined  in  equation  (3.44)  with  rmax  =  93,  C  =  10,  n  =  16384,  and 
various  precisions  e. 


Precision  e 

Figure  19:  Times  and  errors  for  calculating  the  first  n/2  —  10  coefficients  in  the  Fourier- 
Bessel  expansion  of  order  m  =  0,  discretizing  at  n  nodes,  that  is,  applying  the  real  ^-dxn 
matrix  Tq  defined  in  equation  (3.44)  with  rmax  =  93 ,  C  —  10,  n  =  16384,  and  various 
precisions  e. 
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n 

Precomputation 

Direct  evaluation 

Fast  evaluation 

l2  error 

MB  used 

256 

.32E+00 

.85E-04 

.42E-04 

.17E-10 

.21E+00 

512 

.65E+00 

.34E-03 

.99E-04 

.40E-10 

.50E+00 

1024 

.21E+01 

.17E-02 

.25E-03 

.28E-10 

.13E+01 

2048 

.97E+01 

.68E-02 

.59E-03 

.42E-10 

.30E+01 

4096 

.58E+02 

.27E-01 

.16E-02 

.47E-10 

.67E+01 

8192 

.42E+03 

.11E+00 

.36E-02 

.47E-10 

.15E+02 

16384 

.32E+04 

.44E+00 

.79E-02 

.52E-10 

.34E+02 

Tabic  11:  Times,  errors,  and  memory  usage  for  evaluating  Bessel  function  expansions  by 
applying  the  matrix  Ej  defined  in  equation  (3.11),  with  rmax  =  86. 
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Figure  20:  Comparison  of  the  algorithm  of  Section  5  with  direct  calculation  for  evaluating 
Bessel  function  expansions  by  applying  the  matrix  Ej  defined  in  equation  (3.11). 
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n 

Precomputation 

Direct  evaluation 

Fast  evaluation 

l2  error 

MB  used 

256 

.75E+00 

.21E-03 

.61E-04 

.37E-10 

.24E+00 

512 

.14E+01 

.88E-03 

.14E-03 

.35E-10 

.53E+00 

1024 

.34E+01 

.38E-02 

.32E-03 

.38E-10 

.12E+01 

2048 

.86E+01 

.15E-01 

.70E-03 

.50E-10 

.27E+01 

4096 

.28E+02 

.61E-01 

.18E-02 

.62E-10 

.62E+01 

8192 

.13E+03 

.24E+00 

.36E-02 

.54E-10 

.15E+02 

16384 

.77E+03 

(.97E+00) 

.79E-02 

.59E-10 

.37E+02 

Table  12:  Times,  errors,  and  memory  usage  for  evaluating  Hankel  function  expansions  by 
applying  the  matrix  defined  in  equation  (3.12),  with  rmax  =  38. 
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Figure  21:  Comparison  of  the  algorithm  of  Section  5  with  direct  calculation  for  evaluating 
Hankel  function  expansions  by  applying  the  matrix  E ^  defined  in  equation  (3.12). 


6.6  Sums  of  Bessel  and  Hankel  functions 

Table  11  and  Figure  20  display  the  results  of  applying  the  algorithm  described  in  Section  5 
to  the  matrix  Ej  defined  in  equation  (3.11)  where 

Xj  =  n  +  (6-1) 

for  each  integer  j  such  that  1  <  j  <  n.  We  chose  the  value  rmax  =  86  to  optimize  the  running 
times  in  Table  11. 
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Table  12  and  Figure  21  display  the  results  of  applying  the  algorithm  described  in  Section  5 
to  the  matrix  E ^  defined  in  equation  (3.12),  where  the  nodes  x3  are  defined  in  (6.1).  We 
chose  the  value  rmax  =  38  to  optimize  the  running  times  in  Table  12.  The  parameter  rmax  is 
described  in  Section  5.5. 


7  Conclusions  and  further  work 

We  have  presented  an  algorithm  for  the  numerical  computation  of  several  special  function 
transforms  with  asymptotic  running  time  0{n\og{n)),  and  asymptotic  precomputation  cost 
0(n2).  These  running  times  have  been  proven  in  the  case  of  the  Fourier-Bessel  transform. 
Numerical  examples  demonstrate  a  much  wider  applicability;  analysis  of  these  cases  is  in 
progress  and  will  be  reported  at  a  later  date.  An  implementation  of  the  algorithm  described 
in  Section  5  for  the  acceleration  of  spherical  harmonic  transforms  is  currently  under  devel¬ 
opment. 
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